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Abstract 

Background: Many eukaryotic genomes encode c/s-natural antisense transcripts (c/s-NATs). Sense and antisense 
transcripts may form double-stranded RNAs that are processed by the RNA interference machinery into small 
interfering RNAs (siRNAs). A few so-called nat-siRNAs have been reported in plants, mammals, Drosophila, and 
yeasts. However, many questions remain regarding the features and biogenesis of nat-siRNAs. 

Results: Through deep sequencing, we identified more than 17,000 unique siRNAs corresponding to c/s-NATs from 
biotic and abiotic stress-challenged Arabidopsis thaliana and 56,000 from abiotic stress-treated rice. These siRNAs 
were enriched in the overlapping regions of NATs and exhibited either site-specific or distributed patterns, often 
with strand bias. Out of 1,439 and 767 c/'s-NAT pairs identified in Arabidopsis and rice, respectively, 84 and 1 19 
could generate at least 10 siRNAs per million reads from the overlapping regions. Among them, 16 c/'s-NAT pairs 
from Arabidopsis and 34 from rice gave rise to nat-siRNAs exclusively in the overlap regions. Genetic analysis 
showed that the overlapping double-stranded RNAs could be processed by Dicer-like 1 (DCL1) and/or DCL3. The 
DCL3-dependent nat-siRNAs were also dependent on RNA-dependent RNA polymerase 2 (RDR2) and plant-specific 
RNA polymerase IV (PollV), whereas only a fraction of DCL1 -dependent nat-siRNAs was RDR- and PollV-dependent. 
Furthermore, the levels of some nat-siRNAs were regulated by specific biotic or abiotic stress conditions in 
Arabidopsis and rice. 

Conclusions: Our results suggest that nat-siRNAs display distinct distribution patterns and are generated by DCL1 
and/or DCL3. Our analysis further supported the existence of nat-siRNAs in plants and advanced our understanding 
of their characteristics. 
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Background 

Gene regulation through small non-coding RNAs has 
been recognized as an important mechanism for almost 
all cellular processes in eukaryotes. Most small RNAs 
(sRNAs) are processed by RNase Ill-type ribonuclease 
Dicer-like (DCL) proteins, and incorporated into Argo- 
naute (AGO) proteins to guide gene silencing at the 
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transcriptional level through DNA or histone modifica- 
tion, and at the posttranscriptional level by mRNA clea- 
vage and degradation or translational repression [1-5]. 
Based on their precursor structures and biogenesis, 
sRNAs can be categorized into microRNAs (miRNAs) 
and small-interfering RNAs (siRNAs). miRNAs are 
derived from hairpin-structured precursors and do not 
depend on RNA-dependent RNA polymerases (RDRs) 
or RNA amplification mechanisms, whereas endogenous 
siRNAs (endo-siRNA) are generated from long double- 
stranded RNAs (dsRNAs) as a result of antisense tran- 
scription or the activity of plant-specific RNA poly- 
merases such as RNA polymerase (Pol)IV and RDRs 
[6-8]. 
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Different classes of endogenous siRNAs have been 
described in Arabidopsis based on their distinct charac- 
teristics and biogenesis pathways [3,9]. The 24-nucleotide 
heterochromatic siRNAs (hc-siRNAs) or repeat-asso- 
ciated siRNAs (ra-siRNAs) are generated by the PolIV/ 
RDR2/DCL3 pathway and function in RNA-directed 
DNA methylation and histone modifications [2,10-12]. 
Trans-acting siRNAs (ta-siRNAs) are 21 nucleotides in 
length and require DCL4, RDR6, and miRNA-mediated 
cleavage that triggers the phased processing [3]. Long 
siRNAs (IsiRNAs) are 30 to 40 nucleotides in length and 
are dependent on DCL1, HEN1, RDR6 and PolIV [13]. 
IsiRNAs most likely induce decapping and 5' to 3' mRNA 
degradation of their targets [13]. Several as-natural anti- 
sense transcripts (cz's-NATs) have been reported to also 
generate siRNAs [14-18]. These so-called nat-siRNAs are 
induced by abiotic and biotic stresses [16,17,19] or accu- 
mulate in specific developmental stages [14,15]. The bio- 
genesis of salt- and bacterium-induced nat-siRNAs 
appeared to require DCL1 and/or DCL2, RDR6, and 
PolIV in Arabidopsis [16,17]. Moreover, the expression of 
ARIADNE14 is de-repressed in dell, henl, hyll, sde4, 
rdr2 and sgs3, suggesting that the nat-siRNAs generated 
from the ARIADNE1 4/KOKOPELLI overlapping pair is 
dependent on DCL1, HEN1, HYL1, RDR2, SGS3 and 
PolIV [14]. The overlapping regions of some cw-NATs in 
mouse and Drosophila also produce endo-siRNAs, which 
are processed by Dcr-2 and loaded into AG02 [20-24]. 
nat-siRNAs have also been found in budding yeast and 
Schistosoma japonicum [25,26]. Despite these reports, 
there remains a need for conclusive evidence for the 
occurrence of nat-siRNAs in plants [27], because their 
features are not well understood and a general model for 
their generation is still not available. 

Here we report the identification of 33,826 and 
162,397 siRNAs corresponding to 17,141 and 56,209 
unique sequencing reads that were derived from cis- 
NATs of biotic and abiotic stress-challenged Arabidopsis 
and abiotic stress-treated rice, respectively. These siR- 
NAs, displaying either a distributed or a site-specific 
pattern, were enriched in the overlapping regions and 
often exhibited strand bias. Genetic analysis revealed 
that nat-siRNAs in Arabidopsis were mainly generated 
by DCL1 and/or DCL3, and a subgroup of them was 
RDR- and PolIV-dependent. Furthermore, accumulation 
of some nat-siRNAs was regulated by biotic and abiotic 
stress conditions, which may contribute to plant 
responses to environmental stresses. 

Results 

Natural antisense transcripts and c/s-NAT-derived siRNAs 
in Arabidopsis and rice 

We sequenced a total of 21 sRNA libraries prepared 
from biotic and abiotic stress-treated Arabidopsis and 



abiotic stress-challenged Oryza sativa plants. Biotic 
stress datasets for Arabidopsis included infection of 
non-pathogenic Pseudomonas syringae pv tomato (Psi) 
DC3000 hrcC, a virulent strain Pst DC3000 carrying an 
empty vector, and an avirulent strain Pst DC3000 carry- 
ing an effector gene, avrRpt2, as previously described 
[28-30]. Abiotic stresses on Arabidopsis and rice 
included salt, cold and drought treatments (see Materi- 
als and methods). We obtained a total of more than 
24.6, 23.2 and 30.9 million raw sequencing reads from 
these biotic and abiotic stress-treated Arabidopsis and 
abiotic stress-treated rice small-RNA libraries, respec- 
tively (Additional file 1). After trimming adaptor 
sequences and discarding reads shorter than 17 nucleo- 
tides or longer than 28 nucleotides or of low quality, we 
obtained 13,985,938, 14,664,923 and 17,446,592 reads 
from the biotic and abiotic stress-treated Arabidopsis 
libraries and the abiotic stress-treated rice libraries that 
perfectly match to the Arabidopsis and rice genomes or 
cDNAs, respectively (Additional file 1). Among all the 
small-RNA species profiled, 24-nucleotide sRNAs were 
the most abundant size class (Additional file 2a). The 5'- 
first nucleotides of sRNAs preferentially are adenosine 
or uracil (Additional file 2c). In addition to a large num- 
ber of miRNAs [28,29], we discovered many new endo- 
genous siRNAs belonging to different siRNA categories. 
In this study, we focused on siRNAs that correspond to 
cw-NATs. 

Using the Arabidopsis genome annotation (TAIR/ 
NCBI version 8.0) and the rice genome annotation 
(MSU RGAP 6.1), we searched for pairs of genes in Ara- 
bidopsis and rice that overlapped more than 25 nucleo- 
tides at the same genomic loci. We thus identified 1,439 
and 767 pairs of ci's-NATs in Arabidopsis and rice, 
respectively (Table 1). cz's-NATs can be further categor- 
ized into three groups: convergent (3'-3' overlap), diver- 
gent (5'-5' overlap), and enclosed, in which one 
transcript is entirely encompassed by the other. Most 
cz's-NATs (1,126 and 560 pairs in Arabidopsis and rice, 
respectively) are arranged in convergent orientation 
(Table 1). We excluded 65 and 10 cis-NAT pairs in 
which at least one gene encodes rRNA, tRNA, small 
nuclear RNA (snRNA), small nucleolar RNA (snoRNA), 
miRNA, ta-siRNA, or transposons in Arabidopsis and 
rice, respectively, from subsequent analyses. These RNA 
genes or transposons are considered 'hot spots' of RNA 
degradation or small RNA generation and may mask the 
real distribution of nat-siRNAs. We also analyzed the 
NAT genes targeted by miRNAs to eliminate the possi- 
bility that the sRNAs that matched to the transcripts are 
secondary siRNAs triggered by 22-nucleotide miRNAs 
[31,32]. We found that At2g33810 in the At2g33810/ 
At2g33815 pair was a target of miR156 and Atlg53230 
in the Atlg53230/Atlg53233 pair was a target of 
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Table 1 Different types of c/s-NATs identified in Arabidopsis and rice and c/s-NATs that generated c/s-nat-siRNAs 



Type of c/s- 
NAT pairs 


All 
Mil 

pairs 


All nsirc 

Mil pairs 
analyzed 


rdllb Willi > IU SlnlNrt 

reads in OR (%) a 


rdlib Willi blmNMb 

only in the OR (%) b 


rails Willi blnlMrtb ->Z lOlu 

enriched in OR (%) b 


rdlib Willi blnlNMb OI 

strand bias (>2-fold) (%) b 


A^obidopsis 














Convergent 


1,126 


1,103 


23 (24%) 


5 (21.7%) 


9 (37.5%) 


13 (54.2%) 


Divergent 


142 


127 


22 (17.3%) 


4 (18.2%) 


5 (20.0%) 


11 (44.0%) 


Enclosed 


171 


144 


39 (27.1%) 


7 (17.9%) 


1 0 (23.8%) 


22 (52.4%) 


Total 


1,439 


1,374 


84 (6.1%) 


16 (19.0%) 


24 (26.4%) 


46 (50.5%) 


Rice 














Convergent 


560 


551 


68 (12.3%) 


20 (29.4%) 


31 (45.6%) 


36 (52.9%) 


Divergent 


100 


99 


12 (12.1%) 


1 (8.3%) 


3 (25.0%) 


4 (33.3%) 


Enclosed 


107 


107 


39 (36.4%) 


1 3 (33.3%) 


11 (28.2%) 


23 (59.0%) 


Total 


767 


757 


119 (15.7%) 


34 (28.6%) 


45 (37.8%) 


63 (52.9%) 



OR, overlap region of a c/s-NAT transcript pair. a The percentage of c/s-NAT pairs in all c/s-NAT pairs analyzed. b The percentage of the specified c/s-NAT pairs in 



the corresponding regions with siRNAs. 



miR319. However, neither miR156 nor miR319 is 22 
nucleotides, and the siRNAs generated from At2g33810 
and Atlg53230 were not arranged in a phasing pattern, 
which is a characteristic pattern of miRNA-triggered 
secondary siRNAs. Therefore, we retained At2g33810/ 
At2g33815 and Atlg53230/Atlg53233 in our list. A 
total of 1,374 NAT pairs in Arabidopsis were used in 
this study. Among them, 186 genes (6.8% of 2,748 NAT 
genes) are annotated as 'other RNAs'. They are most 
likely non-coding RNAs or RNAs with potential to 
encode very short proteins or peptides. 

We searched our small RNA deep-sequencing libraries 
for reads that matched perfectly (that is, no mismatches) 
to the c/s-NATs. We found that 84 (6.1% of 1,374) and 
119 (15.7% of 757) of the Arabidopsis and rice c/s-NAT 
pairs, respectively, gave rise to at least 10 sRNAs per 
one million total genome-mapped reads from the over- 
lapping regions of the c/s-NATs in all the libraries 
(Table 1; Additional file 3). Interestingly, among the 84 
Arabidopsis NAT pairs, 54 pairs have one transcript 
(32.1% of 168 genes) annotated as 'other RNAs' (Addi- 
tional file 4), suggesting that non-coding antisense 
RNAs are more likely to generate siRNAs. A total of 
33,826 and 162,397 siRNAs corresponding to 17,059 
and 56,209 unique reads were derived from c/s-NATs in 
Arabidopsis and rice, respectively (Table 2). As shown 
in Additional file 2, nat-siRNAs are mainly 21 and 24 
nucleotides in length, whereas in the total small RNA 
population, the 24-nucleotide siRNAs are the most 
abundant species. The first-nucleotide distribution for 
the nat-siRNAs was similar to that of the total sRNAs 
(Additional file 2). 

Within the three types of c/s-NATs, enclosed c/s-NATs 
have the highest percentages for producing siRNAs, that 
is, 39 (27.1% of 144) and 39 (36.4% of 107) of the 
enclosed c/s-NATs in Arabidopsis and rice, respectively, 
generated siRNAs (Table 1). Among the 84 and 119 cis- 
NAT pairs that produce more than 10 siRNAs per 



million reads in the overlapping regions in Arabidopsis 
and rice, respectively, 16 and 34 pairs (19.0% and 28.6%) 
have siRNAs exclusively in the overlapping regions 
(Table 1, Figures la-i and 2a). To determine if siRNAs 
were truly enriched in the overlapping regions of NATs, 
we took into consideration the length differences 
between the overlapping and non-overlapping regions of 
c/s-NATs by comparing the densities of siRNAs in these 
two regions. A statistical test showed that the density of 
siRNAs in the overlapping regions was more than six 
times that in the non-overlapping regions, with P-values 
of 0.0077 and 0.0242 in Arabidopsis and rice, respec- 
tively (see Materials and methods). To further explore 
whether c/s-NATs have a higher likelihood to give rise 
to siRNAs than other protein-coding genes, we com- 
puted the densities of sRNAs in the 3'-UTR overlapping 
regions of convergent c/s-NATs and 5'-UTR overlapping 
regions of divergent c/s-NATs, and the siRNA densities 
in the 3'-UTRs and 5'-UTRs of protein-coding genes 
that were not involved in NATs, respectively (see Mate- 
rials and methods). We then compared the density of 
siRNAs in the convergent c/s-NATs with the density of 
siRNAs in the 3'-UTR of non-NAT genes in Arabidop- 
sis; similarly, we compared divergent c/s-NATs with the 
5'-UTR of non-NAT genes. The densities of siRNAs in 

Table 2 The numbers of nat-siRNAs from the entire 
regions of c/s-NATs, overlap regions of c/s-NATs, introns 
in entire regions of c/s-NATs and introns in overlap 
regions of c/s-NATs in Arabidopsis and rice 

Arabidopsis Rice 



Origin of nat-siRNAs 


Total 


Unique 


Total 


Unique 


Entire NATs 


33,826 


17,059 


162,397 


56,209 


Introns in entire NATs 


14,500 


6,988 


88,605 


29,654 


OR of NATs 


1 0,429 


6,532 


27,852 


10,761 


Introns in OR of NATs 


4,367 


2,350 


14,207 


5,494 



Raw reads of siRNAs that perfectly mapped to the Arabidopsis and rice 
genomes were analyzed here. OR, overlap region. 
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Figure 1 Distinct distribution patterns of Arabidopsis nat-siRNAs. (a-i) Arabidopsis nat-siRNAs displaying a distributed pattern and exclusively 
derived from the overlapping regions, (j-o) Arabidopsis nat-siRNAs displaying a distributed pattern but derived from both the overlap region and 
non-overlap region. sRNAs matching the upper and lower strands are displayed above and below the NAT pairs, respectively. The red regions on 
the gene model represent exons, whereas the blue regions represent introns. The region between the green lines represents the overlapping 
region of the NATs. siRNAs probed are indicated by black arrows. 



Zhang et al. Genome Biology 2012, 13:R20 
http://genomebiology.com/201 2/1 3/3/R20 



Page 5 of 16 



5' OS03G02762 



OS03G02770 



Total reads: 50, o crlap region reads: 50 
Sense reads: 14, ; ntisense reads: 36 




OS08G06220 
3' 



OS08G06230 



re; ids: 745, overlap region reads: 633 
: rqads: 433, antisense reads: 312 



I reac s 



Tolal 

overlap rdgion 
Sense re; 



antisense 

I 



OS04G57160 



<)S04(i57140 



580, 

reads: 390 
248. 
reads: 332 



a Is 



Figure 2 Distribution of rice nat-siRNAs with distribution pattern OS03G02762/OS03G02770 (a) and OS08G06220/OS08G06230 (b) or 
site-specific pattern OS04G57140/OS04G57160 (c). Strands above or under the NAT pairs represent sRNAs positively or negatively mapping 
to the upper strands. In the gene model, exons and introns are indicated by red regions and blue regions, respectively. The overlapping regions 
of the NATs are indicated by green lines. 



divergent 5' and convergent 3' cw-NATs were statisti- 
cally greater than that of the 5'- and 3'-UTRs of non- 
NAT genes, with P-values of 0.0366 and 0.0438, respec- 
tively. These results confirmed that cw-NATs were more 
likely than non-NAT genes to generate siRNAs. This is 
further supported by the study of Argonaute-associated 
sRNAs, which indicated that cw-NATs generated 2.29 
times more sRNAs than other gene transcripts [33]. Our 
results were also consistent with that of animal endo- 
siRNAs, which are also enriched in the overlapping 
regions of c/s-NATs, for example, more than ten-fold 
enrichment in Drosophila [21-23,25,34]. 

The distribution of siRNAs along the cw-NATs display 
two distinct patterns, one is the distributed pattern with 
siRNAs scattered along the overlapping regions or 
across the entire transcripts (Figures 1 and 2a, b), and 
the other is the site-specific pattern with siRNAs derived 
from one or a few specific sites (Figures 2c and 3) (see 
Materials and methods for additional details). We iden- 
tified 9 site-specific patterns from Arabidopsis and 23 
from rice (Additional files 5 and 6). Because RNA sec- 
ondary structure analysis at these specific sites using 
RNAFold [35] revealed no obvious stem-loop structures, 



these regions were unlikely to harbor new miRNA genes 
or nat-miRNA genes as described [36], although we are 
aware of the limitation of RNAfold. Moreover, 46 and 
63 (50.5% and 52.9%) c/s-NAT pairs in Arabidopsis and 
rice, respectively, exhibited strand bias in generating 
nat-siRNAs with different directions (with more than 
two-fold change) (Table 1). 

Requirement of DCL1 and/or DCL3 for the accumulations 
of nat-siRNAs 

To experimentally characterize the nat-siRNAs, we 
examined the expression of 15 new nat-siRNAs identified 
in Arabidopsis with relatively high numbers of reads. In 
addition to nat-siRNAATGB2, another 6 out of the 15 
nat-siRNAs that we examined showed clear siRNA sig- 
nals by small RNA Northern blot analysis using locked 
nucleic acid (LNA) probes (indicated by black arrows in 
Figures 1 and 3; probes used are listed in Additional file 
7), which enabled us to study their biogenesis by examin- 
ing their expression in small RNA pathway mutants. The 
biogenesis of the site-specific nat-siRNA generated from 
the divergent pair Atlg51400/Atlg51402 (referred to as 
nat-siRNAlg51400; Figure 3c) was dependent on DCL1 
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Figure 3 Site-specific patterns of Arabidopsis nat-siRNAs At4g35850/At4g35860 (a), At2g33793/At2g33796 (b), and At1g51400/ 
At1g51402 (c). siRNAs that positively or negatively map to the upper genes are present above or under the gene pair, respectively. The introns 
of c/s-NATs are represented by blue strands and overlapping regions of c/5-NATs are shown by green lines. The black arrow represents the 
siRNAs detected by Northern blot. 
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and HEN1, but did not require any RDRs or PolIV 
(Figure 4a). Similarly, the nat-siRNA generated from the 
convergent pair At2g41590/At2g41600 with a distributed 
pattern (referred to as nat-siRNA2g41590; Figure lj) was 
also DCLl-dependent but RDR- and PolIV-independent 
(Figure 4b). Both nat-siRNAs were 21 nucleotides in 
length. Previous studies have shown that the biogenesis 
of bacterial-induced nat-siRNAATGB2, and the accumu- 
lation of some of the salt-induced nat-siRNASR05 and 



development-associated nat-siRNAKPL also depended on 
DCL1 [14,16,17]. 

We found that the convergent NAT pair Atlgl3330/ 
Atlgl3340 (Figure lo) could produce two classes of siR- 
NAs, 21-nucleotide and 24-nucleotide siRNAs (referred 
to as nat-siRNAlgl3340) from the same site (Figure 5a). 
Genetic analysis revealed that the 21-nucleotide siRNAs 
were DCLl-dependent, whereas the 24-nucleotide siR- 
NAs were DCL3-dependent. Furthermore, we found 
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Figure 4 Biogenesis of nat-siRNAs that depend on DCL1. (a-c) Accumulation of nat-siRNA1 g51400 (a), nat-siRNA2g41590 (b) and nat- 
siRNA4g34070 (c) is shown in various del, rdr, pollV and polV mutants. Total RNA (75 to 100 ug) was used for Northern blot analysis. LNA probes 
complementary to each nat-siRNA were used. U6 was an internal control to show equal loading. The same results were obtained from two 
biological replicates. 
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Figure 5 Biogenesis of nat-siRNAs that depend on DCL3. (a-c) Accumulation of nat-siRNA1g13340 (a), nat-siRNA1g28100 (b) and nat- 
siRNAIgl 7745 (c)Total RNA (75 to 100 ug) of distinct del, rdr, pollV and polV mutants and corresponding WT are probed by LNA probes 
indicated in Figures 1 and 3. U6 was used as a loading control. Similar results were obtained in two biological repeats. 



that two other 24-nucleotide nat-siRNAs with a distrib- 
uted pattern, nat-siRNAlg28100 generated from NAT 
pair Atlg28100/Atlg28110 (Figure 11) and nat-siR- 
NAlgl7745 generated from NAT pair Atlgl7744/ 
Atlgl7745 (Figure In), were also dependent on DCL3 
(Figure 5b, c). Interestingly, all the DCL3-dependent 24- 
nucleotide nat-siRNAs examined are dependent on 
RDR2 and PollV (Figure 5), while only some of the 21- 
nucleotide nat-siRNAs are dependent on RDRs and 
PollV [14,16,17]. Deep sequencing results showed that 
82 out of the 84 ds-NAT pairs gave rise to two classes 
of siRNAs, 20- to 22-nucleotide and 23- to 28-nucleo- 
tide siRNAs in the overlapping regions, while the 
remaining two pairs gave rise to the 20- to 22-nucleo- 
tide class only (Additional file 4). Thus, our results 



suggested that the ds-NATs have the potential to be 
processed by both DCL1 and DCL3. 

nat-siRNAs generated from introns 

In both plants and animals, nat-siRNAs are derived 
from NAT mRNAs and mainly function post-transcrip- 
tionally [16,17,23]. This was also supported by our 
observation that some NATs with distributed nat-siR- 
NAs predominantly appeared in exonic regions of pro- 
tein coding genes, for example, At2g33810/At2g33815, 
Atlgl3330/Atlgl3340, OS08g06220/OS08g06230, and 
At4g35850/At4g35860 (Figures lg, o, 2b and Figure 3a). 
However, based on the Arabidopsis genome annotation 
(TAIR version 8), we found that 383 of the 1,374 ds- 
NAT pairs (27.9%) contain introns in the overlapping 
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regions. Among them, 45 (11.7%) have siRNAs mapped 
to introns. Examples are shown for At2g41590/ 
At2g41600, AT434070/At4g34071 and At2g02795/ 
At2g02800 (Figure lj, k, m). In rice, 346 of 757 (45.7%) 
cw-NAT pairs contain introns in the overlapping 
regions; one example is shown for OS04g57140/ 
OS04g57160 in Figure 2c. A substantial number of nat- 
siRNAs originated from such introns within NATs, as 
listed in Table 2 and Additional files 8 and 9. Similarly, 
it has been reported that a large number of siRNAs are 
also generated from the introns of several c/s-NATs in 
Drosophila [23]. It is possible that the intronic nat-siR- 
NAs are generated from pre-mRNAs in the nucleus 
before the introns are spliced out. 

To confirm these intron-derived nat-siRNAs, we vali- 
dated the candidates in small RNA biogenesis mutants. 
Although the total number of siRNAs from the entire 
overlapping regions might be large, the reads for indivi- 
dual siRNAs were still low. Most of the siRNAs exam- 
ined were below the detection level of Northern blot 
analysis except one siRNA that is derived from an 
intron-exon junction of At4g34070 in the overlapping 
region of the divergent NAT pair At4g34070/At4g34071 
(Figure lk). We detected a distinct 30-nucleotide siRNA 
band using a 24-nucleotide probe. Because we used a 
18- to 26-nucleotide size-fractionated small-RNA popu- 
lation for constructing our small-RNA libraries, IsiRNAs 
were excluded from our small-RNA sequencing dataset. 
To determine whether it was a new IsiRNA like the 
ones we detected before [13], or just a degradation pro- 
duct, we examined the dependence of this 30-nucleotide 
siRNA on components of the small RNA biogenesis 
pathways. We found that the generation of this siRNA 
depended on DCL1 and HEN1, but not on DCL3, or 
any RDRs, PolIV or PolV, indicating that it was a new 
IsiRNA but not a hc-siRNA (Figure 4c). It is worth not- 
ing that the reported AtlsiRNA-1 is also dependent on 
DCL1 and HEN1 [13]. These results suggested that a 
subgroup of nat-siRNAs was likely to be generated from 
pre-mRNAs. 

Some nat-siRNAs regulate the expression of their cognate 
NAT mRNAs in cis 

We found that siRNAs from some cis-NATs accumu- 
lated to different levels in different libraries (Additional 
files 10 and 11). Eighty ci's-NAT pairs in Arabidopsis 
and 37 cw-NAT pairs in rice displayed a more than 
two-fold difference of siRNA accumulation in the over- 
lap regions under at least one stress condition as com- 
pared with mock-treated samples. This result suggested 
that these nat-siRNAs were likely to regulate expression 
of the NATs under stress conditions, although more 
experiments are needed to validate and confirm these 
changes. 



To determine the regulatory function of nat-siRNAs 
on their NAT transcripts in cis, we performed computa- 
tional analysis using dcll-7 and dcl3-l (inflorescence tis- 
sue) microarray data generated by the Carrington lab 
[37] (see Materials and methods), because DCL1 and 
DCL3 are responsible for nat-siRNA biogenesis. Of the 
84 pairs of cw-NAT genes, 93 genes from 78 pairs were 
present on the 22-K microarray chip. Of the 93 ds-NAT 
genes, 22 (23.7%) exhibited a greater than 1.5-fold up- 
regulation in the dcll-7 mutant compared to the wild 
type (WT) control (P-value < 0.05; Additional file 12). 
As a comparison, only 176 out of 2,215 (7.9%) total 
NAT genes were up-regulated. Although we cannot 
absolutely rule out the possible small RNA-independent 
function of DCL1 on the accumulation of miRNA tar- 
gets [38], the result that more siRNA-associated NATs 
(23.5%) were regulated by DCL1 than total NATs (7.9%) 
strongly suggests that these small RNA-producing cis- 
NATs were more prone to down-regulation by DCL1- 
dependent nat-siRNAs. However, we did not find any of 
the 93 NAT genes that were up-regulated more than 
1.5-fold in dcl3, suggesting that the DCL3-dependent 
nat-siRNAs may not be directly involved in the expres- 
sion regulation of the NAT transcripts. 

To confirm the results from the microarray analysis 
and to examine the expression of some siRNA-targeting 
NAT transcripts that were not present on the chip or 
were not identified from the analysis, we used real-time 
RT-PCR to examine their expression in the WT, dcl3-l 
and dcll-7 fwfl double mutant plants. We chose to use 
the double mutant instead of the dcll-7 single mutant 
because the fwfl mutant, which carries a mutation in 
the ARF8 gene [39,40], largely rescued the pleiotropy 
and fertility defects of dcll-7 [41,42]. We found that the 
accumulation of Atlg51402 and Atlgl3330 increased 
six and four times more, respectively, in the dcll-7 fwfl 
mutant than in WT plants (Figure 6), indicating that the 
antisense transcripts were no longer suppressed when 
nat-siRNAlg51400 and nat-siRNAlgl3340 were not 
produced. For At2g41590/At2g41600 and At4g34070/ 
At4g34071 pairs that produce siRNAs from both strands 
and have less strand bias, both sense and antisense tran- 
scripts were up-regulated in the dcll-7 fwfl mutant 
compared to the WT, suggesting that both transcripts 
were under the regulation of nat-siRNAs (Figure 6). 
Although it has been shown that the arf8 mutation does 
not interfere with miRNA-mediated gene silencing or 
RNAi or small RNA generation [39,40], it is still possible 
that up-regulation of these NAT transcripts in the dcll- 
7 fwfl double mutant is due to the fwfl mutation that is 
sRNA-independent. To test this possibility, we also 
examined the expression levels of these NAT genes in 
the fwfl single mutant. As shown in Additional file 13, 
in the fwfl single mutant, there was no clear induction 
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of these NAT transcripts except for Atlgl3340. Both 
transcripts of the Atlgl3340/Aglgl3330 pair were 
induced in the dcll-7/fwf2 double mutant and 
Atlgl3340 was induced to a similar extent in both sin- 
gle and double mutants (about two-fold), suggesting 
that the up-regulation of Atlgl3340 was mainly due to 
the arf8 mutation whereas induction of Atlgl3330 was 
attributable to the de-repression of small RNA-mediated 
silencing by the mutation in DCL1. sRNAs generated 
from this pair had a strong strand bias (Figure lo), were 
predominantly generated from Atlgl3340 and targeted 
Atlgl3330. These results correlated well with our real- 
time RT-PCR results. In summary, these data indicated 
that the expression of NAT transcripts was regulated by 
the small RNAs generated from DCL1. 

We found that only three out of the ten transcripts 
that we tested showed an increase in the dcl3-l mutant 
compared to WT (Figure 6), suggesting that some 24- 
nucleotide nat-siRNAs are still able to regulate the 
expression of NAT transcripts, although a majority of 
the NATs were not affected by the 24-nucleotide 
siRNAs. 

To further analyze the function of these nat-siRNAs, 
we searched the publicly available deep sequencing data- 
sets of the AGO-associated small-RNAs in Arabidopsis 
for the nat-siRNAs in the overlap regions we identified 
[43,44]. As shown in Additional file 14, the 20- to 22- 
nucleotide nat-siRNAs were mainly loaded into AG02, 
the Argonaute protein that predominately binds 21- 
nucleotide sRNAs and contributes to anti-viral and anti- 
bacterial resistances [45-48]. Interestingly, the 23- to 26- 
nucleotide long nat-siRNAs were loaded into both 



AGOl and AG04, the Argonaute proteins responsible 
for mRNA cleavage and DNA methylation, respectively 
[49,50]. 

Discussion 

We identified 17,141 and 56,209 unique nat-siRNA 
sequences originating from ci's-NATs in biotic and abio- 
tic stressed Arabidopsis plants and abiotic stress-chal- 
lenged rice, respectively. These siRNAs are enriched in 
the overlapping regions of cis-NAT pairs and display 
distributed or site-specific patterns. 

Our biogenesis analysis suggested that the siRNAs 
from c/s-NATs were dependent on DCL1 and/or DCL3. 
Deep sequencing results showed that 82 out of the 84 
cis-NAT pairs that generated siRNAs in the overlapping 
regions gave rise to two classes of siRNAs, 20- to 22- 
nucleotide and 23- to 28-nucleotide nat-siRNAs, while 
only two pairs of NATs gave rise to one class of 20- to 
22-nucleotide siRNAs (Additional file 4). The 20- to 22- 
nucleotide nat-siRNAs are generated by DCL1, whereas 
the 23- to 28-nucleotide nat-siRNAs were DCL3-depen- 
dent (Figures 4 and 5). We also provide an example that 
a DCLl-dependent 21-nucleotide nat-siRNA and a 
DCL3-dependent 24-nucleotide nat-siRNA were gener- 
ated from the same site in the overlapping region of a 
NAT pair (Figure 5a). These two classes of sRNAs have 
also been observed in transgene silencing, viral RNA 
silencing and endogenous inverted repeat (IR) loci, 
where long double-stranded precursors or intermediates 
are formed [51-55]. Different kinds of transgenic con- 
structs can generate different types of sRNAs: sense 
transgenes mainly generate 21- and 22-nucleotide 
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Figure 6 Expression of the NAT transcripts is increased in dcl1-7 fwf2 and dcl3-1 mutants. Expression was examined by quantitative RT- 
PCR and Actin2 was used as an internal control. Total RNA (5 ug) was treated with DNasel and then subjected to reverse transcription. Error bars 
indicate standard deviations derived from three technical replicates. Similar results were obtained from two biological replicates. 



Zhang et al. Genome Biology 2012, 13:R20 
http://genomebiology.com/201 2/1 3/3/R20 



Page 10 of 16 



siRNAs by DCL2 [56]; IR transgenes generate both 21- 
and 24-nucleotide siRNAs by DCL4 and DCL3, respec- 
tively, or mainly 21-nucleotide siRNAs by DCL4 [55-58]; 
DCL1 and DCL2 are also involved in the generation of 
siRNAs induced by IR transgenes [56]. Similarly, the 21- 
nucleotide, 22-nucleotide and 24-nucleotide endogenous 
siRNAs are processed by DCL4, DCL2 and DCL3, 
respectively, from endogenous IR loci [52,53]. DCL1 is 
required for the accumulation of 21- and 24-nucleotide 
siRNAs processed from IR transgenes or 22- and 24- 
nucleotide siRNAs processed from certain endogenous 
IR loci [53,59]. Viral RNAs are processed into 21- and 
24-nucleotide siRNAs by DCL4 and DCL3, respectively, 
and DCL2 processes viral RNAs into 22-nucleotide siR- 
NAs when DCL4 is suppressed or inactivated [60-64]. 
Furthermore, some endogenous miRNAs in Arabidopsis 
and rice also generate two distinct sizes of DCLl-depen- 
dent short miRNAs and DCL3-dependent long siRNAs 
or miRNAs at the same sites [28,65]. Our results suggest 
that, like the endogenous IR loci, transgene loci and 
viral replicons as well as cis-NAT sense and antisense 
transcripts form double-stranded RNAs that can poten- 
tially be processed by DCL1 and/or DCL3 to generate 
20- to 22-nucleotide short siRNAs and 23- to 28- 
nucleotide long siRNAs, respectively. In this regard, the 
previously reported apparent DCL2 dependence of a 24- 
nucleotide nat-siRNA [17] is intriguing and needs to be 
re-examined. 

By examining the expression of NAT transcripts that 
are complementary to the siRNAs and analyzing pub- 
lished microarray data, we found that many 21-nucleo- 
tide nat-siRNAs were able to regulate the accumulation 
of their target transcripts (Figure 6; Additional file 12), 
which was consistent with the function of reported 21- 
nucleotide nat-siRNAs processed by DCL1 [14,16,17]. 
However, only a small number of NATs targeted by siR- 
NAs were up-regulated in the dcl3 mutant, suggesting 
that only a fraction of the 24-nucleotide class of siRNAs 
may regulate gene expression directly. Similar results 
were observed in the viral-derived 24-nucleotide siR- 
NAs. Although 24-nucleotide viral siRNAs are as abun- 
dant as 21-nucleotide and/or 22-nucleotide siRNAs, 
they can neither silence viral RNAs nor suppress viral 
infection [60-64]. Endogenous 24-nucleotide siRNAs are 
mainly generated by the DCL3/RDR2/PolIV pathway, 
and are often associated with AG04 and AG06 and 
mainly direct DNA methylation or chromatin modifica- 
tion [49,50,54,66]. We found that the generation of the 
DCL3-dependent 23- to 28-nucleotide nat-siRNAs also 
required RDR2 and PolIV. Further studies are necessary 
to determine whether these 23- to 28-nucleotide nat- 
siRNAs can also direct DNA methylation or histone 
modification at their NAT loci. 



We observed two major distribution patterns of nat- 
siRNAs - distributed and site-specific - in Arabidopsis 
and rice. We found that the site-specific nat-siRNAs 
tested were generated by DCL1. Because of its predomi- 
nant role in processing hairpin structures into miRNAs, 
DCL1 is generally assumed to be capable of recognizing 
certain RNA secondary structures. The sense and anti- 
sense transcripts in a NAT pair are transcribed sepa- 
rately, and their overlapping regions may interact to 
form a dsRNA. In this partially dsRNA duplex, complex 
secondary structures may arise, which are recognized by 
the versatile DCL1 enzyme, and thus are processed to 
yield site-specific nat-siRNAs, or possibly distributed 
nat-siRNAs, depending on what secondary structures 
are formed and recognized by DCL1. It is also possible 
that these apparently site-specific nat-siRNAs may be 
generated across the whole overlapping region first as 
the distributed ones, but only specific siRNAs from par- 
ticular sites are stable and thus accumulated, while the 
rest are degraded. 

We showed that RDRs were not always required for 
nat-siRNA formation (Figure 4a-c), which suggested that 
the dsRNAs formed within the overlapping regions of 
the sense and antisense transcripts were sufficient for 
producing nat-siRNAs in such cases. As for the nat-siR- 
NAs that do require RDRs [13,16,17], they were not 
completely eliminated in the rdr mutants. The partial 
dependence on RDRs suggested that these nat-siRNAs 
could be primarily generated from the dsRNA regions 
formed within overlapping sense-antisense transcripts, 
and then may be amplified by RDRs in a secondary 
amplification step. However, because the enrichment of 
c/s-NAT-generated siRNAs compared to all genie siR- 
NAs is relatively weak, one should not automatically 
assume that siRNA clusters corresponding to the cis- 
NAT genes are always caused by the cis-NAT configura- 
tion; it is possible that some siRNAs may coincidentally 
be in a cis-NAT region. Nevertheless, our observation of 
some siRNA clusters that appear exclusively in the over- 
lapping regions of NATs strongly supports that at least 
some siRNAs are indeed caused by the cis-NAT 
configuration. 

We also found that many nat-siRNAs were derived 
from introns or intron-exon junctions. It has been 
shown that the sense/antisense dsRNA pairs in verte- 
brates are processed in the nucleus but not the cyto- 
plasm [67,68]. Therefore, some plant nat-siRNAs may 
also be produced in the nucleus before introns are 
spliced out from the NATs. Moreover, we found a new 
Arabidopsis IsiRNA, lsiRNA4g34070, which was gener- 
ated from the exon-intron junction of a NAT pair. Simi- 
lar to AtlsiRNA-1, lsiRNA4g34070 was processed by 
DCL1. 
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The extent of transcript overlap within each ds-NAT 
may differ from the TAIR annotations and requires 
careful examinations. For example, recent 3' rapid 
amplification of cDNA ends (RACE) experiments 
revealed that the major SR05 transcripts (data not 
shown) are shorter than previously indicated [17], 
although the experiments also showed a longer SR05 
transcript that extends beyond the detected nat-siRNAs 
(GenBank accession JQ513374). Some genes such as 
AG03 were annotated in the TAIR database as having 
no 3'-UTRs, but longer 3'-UTRs for some of these genes 
have been recovered by 3'-RACE in our labs and others. 
It appears that many genes have multiple forms of tran- 
scripts with various lengths of 3'-UTRs. 

Plant nat-siRNAs have been shown to regulate gene 
expression in stress responses or under specific develop- 
mental stages [16,17]. In this study, we identified new 
nat-siRNAs that appeared to regulate the expression of 
their NAT mRNAs, which may contribute to gene 
expression reprogramming and fine-tuning in response 
to environmental stresses. NATs are widespread in 
eukaryotic organisms [19,69-72]. The expression of anti- 
sense transcripts may differ among different cell types 
or under different environmental conditions [19,73]. It 
is likely that different sets of nat-siRNAs may be gener- 
ated from NATs in response to different developmental 
and environmental cues, and therefore play a broad reg- 
ulatory role in gene expression under specific condi- 
tions. Several reported nat-siRNAs, such as nat- 
siRNASR05, nat-siRNAATGB2 and nat-siRNAKPL, are 
induced under stress or appear at specific developmental 
stages and suppress the expression of their antisense 
transcripts post-transcriptionally [14,16-18,69]. Similar 
examples have also been observed in animal systems, for 
example, the siRNAs from the SIC34al gene pair are 
only generated from mouse kidney and testis [67]. 
Moreover, many cis-NAT gene pairs in mammalian 
brain, including the genes involved in Alzheimer's dis- 
ease and LRRTM1/Ctnna2, a pair of a's-NAT genes that 
participates in schizophrenia, can generate nat-siRNAs 
[74]. A lot of these nat-siRNAs are up-regulated in 
olfactory discrimination training [74]. Therefore, most 
of the reported nat-siRNAs are generated under specific 
developmental or environmental conditions. In this 
study, we detected bacterial-induced nat-siRNAATGB2 
[16], as well as another siRNA that is in phase with nat- 
siRNAATGB2 (Figure 3a; Additional file 10) by deep 
sequencing, whereas the developmental stage-specific 
nat-siRNA nat-siRNAKPL was not present in our data- 
base [14]. The salt-induced nat-siRNAs from the SR05- 
P5CDH pair were also present in this dataset, but the 
number of reads was below our cutoff threshold (mini- 
mum ten reads per million total match reads in the 
overlapping region), so it was not included in our list. In 



order to be consistent with other abiotic stress treat- 
ment, the salt treatment condition in this study was dif- 
ferent from that in the published work [17]. However, 
more SR05-P5CDH siRNAs were detected from two 
other salt-treated small RNA libraries in a separate 
study (Additional file 15; Gene Expression Omnibus 
(GEO) accession number GSE33642). It is worth noting 
that our genome-wide analysis on nat-siRNA-targeting 
NAT transcripts used the microarray result generated 
from healthy inflorescences (GEO accession number 
GSE2473), which may miss some NAT transcripts that 
are regulated by siRNAs at a different developmental 
stage or under specific environmental conditions. In 
addition, we cannot rule out the possibility that some of 
the nat-siRNAs regulate the expression of their targets 
by translational inhibition and therefore cannot be 
directly detected at the mRNA level by microarray or 
real-time RT-PCR experiments. 

Conclusion 

Our studies of plant sRNAs showed that those derived 
from plant cz's-NATs were enriched in their overlap 
regions and cw-NATs are more likely to generate 
sRNAs than other transcripts. Taken together with pre- 
viously published data on plants, Drosophila and yeast, 
our results strongly support the existence of nat-siRNAs 
in eukaryotic organisms. Our genetic analysis indicated 
that c/s-NATs were processed by DCL1 and/or DCL3. 
By analyzing a large number of small RNA libraries 
from stress-challenged plants, we found that some nat- 
siRNAs were likely to respond to different environmen- 
tal stresses, and therefore contributed to plant stress 
resistance, development or other cellular processes by 
regulating corresponding NAT mRNAs. nat-siRNA- 
mediated gene regulation is one of the regulatory 
mechanisms for at least a subgroup of ds-NATs. 

Materials and methods 

Plant materials 

Arabidopsis thaliana mutants (dcll-9 fwf2, dcll-7 fwf2, 
dcl2-l, dcl3-l, dcl4-2, henl, rdr2-l, rdrl-1, rdr6-15, 
PolIV-3 and PolV-11, Jwf2) and their corresponding WT 
ecotypes were used in this study. Bacterial infection 
assays were performed on 4-week-old Arabidopsis 
grown at 23°C with 12 h light as described previously 
[28-30]. Abiotic stress assays were performed on 4- 
week-old Arabidopsis gll plants expressing Luciferase 
under the control of RD29 promoter grown at 23°C and 
16 h light, and 3-month-old rice plants (Oryza sativa cv. 
japonica) grown at 28°C with 13 h light. 

Small-RNA library construction and deep sequencing 

Bacteria infiltration was carried out on the leaves of 4- 
week-old Arabidopsis WT and mutant plants as 
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described previously [46]. Briefly, leaves of 4-week-old 
Arabidopsis plants growing at 23°C with 12 h light were 
infiltrated with mock (10 mM MgC^), non-pathogenic 
strain Pst DC3000 hrcC , Pst DC3000 (empty vector) 
and Pst DC3000 {avrRpt2) at a concentration of 2 x 10 7 
cfu/ml. The leaves were collected at 6 and 14 hours 
post-inoculation. 

For abiotic stress treatment in Arabidopsis, plants 
grew at 23°C with 16 h light for 19 days, and were then 
divided into four groups. One of the groups was water- 
deprived (drought treatment) while the others were 
maintained with regular irrigation. At day 29, one group 
of plants was treated with salt solution (200 mM NaCl 
as irrigation solution for 24 h). Another group of plants 
was transferred to a growth chamber at 5°C and 16 h 
light for 24 h for cold treatment. The fourth group of 
plants was always maintained under regular conditions 
as a normal control. At day 30, all four groups of plants 
were collected. For small RNA library construction, the 
whole shoots of plants (stems, leaves and inflorescences) 
were used. 

Three-month-old rice growing at 28°C and approxi- 
mately 13 h of natural light were treated with drought 
(water withholding for 3 days), cold (5°C for 24 h) and 
salt (400 mM NaCl for 24 h, added as irrigation solu- 
tion). The inflorescences of these plants were collected 
and used for sRNA library construction. 

sRNA extraction and library construction were carried 
out as described previously [75,76]. Briefly, total RNA 
was isolated from infiltrated leaves and fractionated on 
15% denaturing polyacrylamide gel. RNA molecules ran- 
ging from 18 to 26 nucleotides were excised and ligated 
to 5'- and 3'-RNA adaptors using T4 RNA ligase fol- 
lowed by RT-PCR and gel purification as described in 
the instructions from Illumina Inc. The small RNA 
libraries were sequenced by Illumina Inc. and UCR core 
facility. 

Northern blot analysis of siRNAs 

We used LNA probes for small RNA Northern blot ana- 
lysis to detect nat-siRNAs. A total of 100 \ig total RNA 
was separated on a 17% acrylimide gel. The blots were 
probed at 39°C with PerfectHyb Plus Hybridization Buf- 
fer (Sigma-Aldrich, Saint Louis, MO, USA). The 
sequences of the oligos were listed in Additional file 7. 

Quantitative RT-PCR analysis of small RNA targets 

For QRT-PCR analysis, 5 ug total DNasel-treated (Invi- 
trogen, Grand Island, NY, USA) RNA was used for 
synthesizing cDNA using Oligo dT and Superscript II 
(Invitrogen). Amplification of small RNA targets was 
carried out using a real-time PCR machine (MyiQ, Bio- 
Rad, Hercules, CA, USA). The sequences of primers 
used here are listed in Additional file 7. 



Pre-processing of deep sequencing data 

Raw sequence reads were parsed to remove 3'-adaptors 
using an in-house program. Unique sequences from 
Arabidopsis small-RNA libraries were mapped to the 
Arabidopsis chromosome sequences [77]. To produce 
candidate datasets of sRNAs, we removed sequencing 
reads that matched to rRNAs, tRNAs, snRNAs, snoR- 
NAs, transposons, retrotransposons, or repeats. rRNA, 
snRNA and snoRNA sequences were obtained from 
TAIR8 genome annotation, and tRNA sequences were 
downloaded from the Arabidopsis tRNA database [78] 
and TAIR [77]. Transposons, retrotransposons, and 
repeat sequences were downloaded from the TIGR v2 
Arabidopsis Repeat Database [79] and RepBase [80]. 

The rice genome and annotation were retrieved from 
the MSU Rice Genome Annotation Project V6.1 (RGAP 
6.1). Non-coding RNA (tRNA, rRNA, snoRNA and 
snRNA) were collected from fRNAdb [81] and the 
UCSC tRNA database. Transposons, retrotransposons 
and repeat sequences were based on transposable ele- 
ments of MSU RGAP 6.1 and RepBase [80]. 

Analysis of enrichment of endogenous sRNAs in the cis- 
NATs 

The enrichment of sRNAs in the overlapping regions of 
NATs was examined by the method outlined in [69]. 
The sRNAs in the combined libraries for Arabidopsis or 
rice were used here. Briefly, for an /-nucleotide genie or 
genomic region that gives rise to n sRNAs, we defined 
nil as the density of small RNA loci in this region. For 
each pair of cw-NATs, we counted the number of 
sRNAs mapping to the overlapping region, N ol and the 
total number of sRNAs matching the non-overlapping 
regions of the two genes, N g , and measured the length 
of the overlapping region, L a , and the total length of 
non-overlapping regions of the two genes, L g . Then we 
computed the density of small-RNA loci in the overlap- 
ping region, N 0 /L 0 , and that of the non-overlapping 
regions of the two NAT genes, N g /L g . The one-tail 
paired two-sample t-test was applied to the two-paired 
samples of siRNA density, No/L 0 and N g /L g . The average 
densities in the overlapping regions (A a ) and in the non- 
overlapping regions of the NAT genes {A g ) that spawn 
sRNAs were computed. The ratio AJA g was considered 
as the enrichment score. 

In order to explore whether cw-NATs were more 
likely to give rise to siRNAs than other protein-coding 
genes, we calculated the average densities in the over- 
lapping regions (A a ) of «'s-NATs. We then calculated 
the average densities A u of a set of randomly chosen 
regions from the other protein-coding genes, which 
have the same size as the c/s-NATs. Statistical signifi- 
cance was measured by a P-value computed as the fre- 
quency of the occurrences that A u was greater than A Q . 
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We applied this method to compare the siRNA density 
of convergent c/s-NATs with that of 3'-UTRs and the 
siRNA density of divergent c/s-NATs with that of 5'- 
UTRs. 

To analyze the regulation of nat-siRNAs under a spe- 
cific condition, the sRNA reads in a single library that 
matched 100% to the genome were normalized to one 
million. Then the normalized sRNA abundances were 
compared between different libraries. Furthermore, we 
calculated the maximum fold change among all treated 
samples compared to corresponding control conditions 
as a supplemental value (Additional files 10 and 11). 

Classification of nat-siRNAs 

To classify the c/s-NAT pairs into distributed and site- 
specific patterns, we first clustered sequencing reads 
mapped to regions of c/s-NATs. Starting from the first 
reads closest to the 5' end of a c/s-NAT, reads were 
clustered if the positions of their first nucleotides were 
within a ten-nucleotide long segment. Clusters with 
more than five reads were retained for further analysis. 
We calculated two metrics for each cis-NAT: 1) the 
number of small-RNA clusters and 2) the percentage of 
the number of reads within these clusters relative to the 
total number of reads mapped to the whole c/s-NAT. 
We then categorized a c/s-NAT to have a site-specific 
pattern if 1) it had no more than ten siRNA clusters 
and 2) the percentage of the reads within the clusters 
was greater than 50%; otherwise we categorized the cis- 
NAT to have a distributed pattern. The criterion of 10 
siRNA clusters was chosen based on the observation 
that the frequency of clusters of 84 Arabidopsis cis- 
NATs has a clear separation between 10 and above 
(Additional file 5a). The percentage criterion (more than 
50% sequence reads) was determined based on a visual 
exanimation of the relationship between the number of 
clusters and the frequencies of reads within the clusters 
in the 84 c/s-NATS, as plotted Additional file 5b. As 
shown in the figure, this relationship seemed to follow 
two distributions, one is a linear correlation represented 
by the line in Additional file 5b and the other is a group 
with no more than ten clusters that retain more than 
50% of the total sequence reads. The second distribution 
thus formed site-specific patterns of siRNAs. We thus 
used these two criteria because a site-specific pattern 
should have relatively more reads appearing in a rela- 
tively smaller number of clusters than a distributed 
pattern. 

Microarray data and data analysis 

The processed microarray data of dcll-7 and Ler WT 
Arabidopsis were downloaded from GEO accession 
number GSE2473 (samples GSM47014, GSM47015, 
GSM47016, GSM47017, GSM47018, GSM47019, 



GSM47023, GSM47024, GSM47025, GSM47026 and 
GSM47027). We identified differentially expressed genes 
using Rank Product [82]. The rationale behind Rank 
Product is that it is unlikely for a gene to be ranked in 
the top position in all replicates if the gene was not dif- 
ferentially expressed. It has been shown that Rank Pro- 
duct is less sensitive to noise and has a better 
performance than other methods when sample size is 
small [83]. We selected differentially expressed genes 
with a false discovery rate no greater than 0.05 for 1,000 
permutations. Differentially expressed probe sets were 
then mapped to corresponding genes according to the 
annotation of the Affymatrix ATH1-121501 chip. 

Analysis of association of AGOs and nat-siRNAs 

The processed AGO data for Arabidopsis were down- 
loaded from the GEO databases under accession num- 
bers GSE10036 and GSE12037. The number of AGO- 
associated nat-siRNAs was obtained by perfectly map- 
ping the identified nat-siRNAs to each of the AGO pull- 
down datasets separately. The number was then normal- 
ized by the total number of AGO reads in an AGO 
dataset that can be perfectly mapped to the genome 
raised to per million for normalization. The enrichment 
of nat-siRNAs in an AGO pull-down relative to the 
other AGOs was calculated as the percentage of the 
normalized nat-siRNAs in the AGO of the total normal- 
ized nat-siRNAs in all AGO pull-downs. 

Identification of nat-siRNA regulated c/s-NATs 

The nat-siRNA regulated c/s-NATs were identified from 
the set of differentially expressed genes from the micro- 
array data of the dcll-7 and Ler WT control Arabidopsis 
plants. The analysis was carried out based on two cri- 
teria: 1) more than 1.5-fold up-regulation with a P-value 
less than 0.05 in the dcll-7 mutant and 2) presence of 
nat-siRNAs produced by the c/s-NATs on the strands 
opposite to the NAT transcripts. The rationale behind 
these criteria is that the main function of DCLl-gener- 
ated nat-siRNAs is to target c/s-NATs on the opposite 
strands to down-regulate their expression. Thus, in a 
DCL1 mutant where no such nat-siRNAs are produced, 
the expressions of the targeted NATs are expected to be 
up-regulated. 

Additional material 



Additional file 1: Distributions of the sequencing reads from small 
RNA libraries of abiotic and biotic treated Arabidopsis and abiotic 
challenged rice. Shown in the table are the total number of raw 
sequencing reads (total), the number of qualified reads that can map 
perfectly to the corresponding Arabidopsis or rice genome (mapped), 
intergenic regions (intergenic), intronic sequences (introns), transposons 
or repeats (repeats/mobile), tRNA, rRNA, snoRNA and snRNA sequences 
(ncRNA), trans-acting siRNA (ta-siRNA), microRNA sequences (miRNAs), 
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intron-exon junctions (intron-exon junction) and exon-exon junctions 
(exon-exon junction). No mismatches were allowed in the mapping. The 
second number for each condition (column) is the percentage of reads 
relative to the total mapped reads. 

Additional file 2: (a-d) Distributions of the lengths (a, b) and the 
first nucleotides (c, d) of total siRNAs and nat-siRNAs in stress- 
challenged Arabidopsis and rice, (a) Length distributions of unique 
sequencing reads in Arabidopsis. The blue and red bars represent total 
siRNAs and nat-siRNAs, respectively, (b) Length distributions of unique 
sequencing reads in rice. The blue and red bars represent total siRNAs 
and nat-siRNAs, respectively, (c) First-nucleotide distribution of unique 
sequencing reads in Arabidopsis. (d) First-nucleotide distribution of 
unique sequencing reads in rice. 

Additional file 3: Normalized reads of siRNAs mapped to the 
overlapping regions of Arabidopsis and rice c/'s-NAT pairs, 
respectively (a) siRNAs mapped to the overlapping regions of 84 
Arabidopsis c/s-NAT pairs, (b) siRNAs mapped to the overlapping 
regions of 1 19 rice c/s-NAT pairs. Each small RNA library was normalized 
to one million reads with 100% matching to the genome. The sum of 
normalized reads from each library was listed for each c/s-NAT pair. 

Additional file 4: Arabidopsis c/'s-NATs with two classes of small 
RNAs mapped to the overlapping regions genel' and gene2' 
represent the first and second transcript in each c/s-NAT pair. Listed are 
84 Arabidopsis c/s-NATs with siRNAs mapped to the overlapping region. 
Raw reads of 20- to 22-nucleotide and 23- to 28-nucleotide classes of 
siRNAs matching 100% to the genome were analyzed. The percentage of 
20- to 22-nucleotide and 23- to 28-nucleotide siRNAs are presented. 

Additional file 5: (a) Distributions of the number of small RNA 
clusters in 84 Arabidopsis c/'s-NATs. The red line represents the 
separation between site-specific (left) and distributed (right) patterns, (b) 
The plot of two metrics of 84 c/s-NATs in Arabidopsis. Each dot 
represents the number of clusters within the c/s-NAT whole region and 
the percentage of small RNA reads in all clusters. The red dots in the 
rectangle were classified as site-specific patterns, whereas the blue dots 
were distributed patterns, represented by a linear correlated line. 

Additional file 6: Classification of c/s-NAT pairs We analyzed 84 
Arabidopsis and 1 19 rice c/s-NAT pairs that have more than 10 raw 
siRNAs mapped to the overlap region. The reads analyzed here are the 
raw sequencing reads mapped to the whole region from the 
combination of all libraries, (a) Arabidopsis c/s-NAT pairs display different 
distribution patterns, (b) Rice c/s-NAT pairs display different distribution 
patterns. 

Additional file 7: Oligos used in this study. A plus sign ('+') before the 
nucleotide depicts LNA residues. 

Additional file 8: Small RNAs mapped to the introns or intron-exon 
junction regions of Arabidopsis c/'s-NATs. Copy number indicates the 
number of raw reads in the combination of all libraries, (a) siRNAs 
mapping to introns in the overlapping region of c/s-NATs in Arabidopsis. 
(b) siRNAs mapping to introns in the whole region of c/s-NATs in 
Arabidopsis. 

Additional file 9: Small RNAs mapped to the introns or intron-exon 
junction regions of rice c/'s-NATs. Raw reads from all the libraries were 
analyzed, (a) siRNAs mapping to introns in the overlapping region of c/s- 
NATs in rice, (b) siRNAs mapping to introns in the whole region of c/s- 
NATs in rice 

Additional file 10: Normalized reads of siRNAs mapped to 84 
Arabidopsis NATs under different conditions. Each library was 
normalized to a million reads that perfectly mapped to the genome. The 
fold change with maximum absolute value between treated sample and 
corresponding control is listed as a supplemental value. The pairs with 
more than two-fold change were annotated in red (up) or in blue 
(down), (a) Reads of small RNAs mapped to overlapping region of NATs 
under different conditions, (b) Reads of small RNAs mapped to whole 
region of NAT under different conditions. 

Additional file 11: Normalized reads of siRNAs mapped to 119 rice 
NATs under different conditions. siRNAs matching 100% to the 
genome were analyzed and each library was normalized to a million 



reads before analyzing. The fold change with maximum absolute value 
between treated and untreated libraries is listed. The pairs with more 
than a two-fold change are displayed in red (up, +) or blue (down, -). (a) 
Reads of small RNAs mapped to overlapping region of NATs under 
different conditions, (b) Reads of small RNAs mapped to whole region of 
NATs under different conditions. 

Additional file 12: Arabidopsis c/s-NATs that are up-regulated in the 
dcll-7 mutant, (a) List of probes of the 93 genes of the 84 c/s-NAT pairs. 
Annotations of the Affymetrix ATH 1 -1 21 501 chip were used and no 
probes were found for the rest of the c/s-NAT genes, (b) The 23 c/s-NAT 
genes up-regulated in dcll-7. 

Additional file 13: Expression analysis of NAT transcripts in the fwf2 
single mutant. The expression of NAT transcripts was analyzed by 
quantitative RT-PCR. Total RNA (5 ug) was used for DNase treatment and 
reverse transcription. Error bars indicate the technical replicates and 
similar results were obtained from two biological repeats. 

Additional file 14: Association of 20- to 22-nucleotide and 23- to 
26-nucleotide classes of nat-siRNAs with different AGOs in 
Arabidopsis Unique reads of nat-siRNAs associated with AGOl, AG02, 
AG04 and AG07 were analyzed. Loaded reads represent unique reads of 
nat-siRNAs that were loaded into distinct AGOs. The percentage in total 
nat-siRNA-AGO libraries indicates the distribution of loaded nat-siRNAs in 
different AGOs. 

Additional file 15: SR05-P5CDH siRNAs derived from salt and cold 
stress challenged Arabidopsis The siRNAs were identified from GEO 
database accession number GSE33642. (a) Reads of siRNAs positively/ 
negatively match to At5G62520 (SR05) in the overlap/non-overlap 
region. The '+ strand' and '- strand' indicate positively or negatively 
matching, (b) Distribution pattern of SR05-P5DCH siRNAs. siRNAs 
positively or negatively matched to At5G62520 are displayed above or 
below the SR05-P5DCH gene pair. Exons and introns are represented by 
red and blue dishes, respectively. The overlapping region is indicated by 
the two green lines. 



Abbreviations 

AGO: Argonaute; c/s-NAT: c/s-natural antisense transcript; DCL: Dicer-like; 
dsRNA: double-stranded RNA; endo-siRNA: endogenous siRNA; GEO: Gene 
Expression Omnibus; hc-siRNA: heterochromatic siRNA; IR: inverted repeat; 
LNA: locked nucleic acid; IsiRNA: long siRNA; miRNA: microRNA; PCR: 
polymerase chain reaction; Pol: RNA polymerase; Pst: Pseudomonas Syringae 
pv tomato; RACE: rapid amplification of cDNA ends; RDR: RNA-dependent 
RNA polymerase; siRNA: small interfering RNA; snRNA: small nuclear RNA; 
snoRNA: small nucleolar RNA; sRNA: small RNA; ta-siRNA: trans-acting siRNA; 
UTR: untranslated region; WT: wild type. 
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